normalized kernel
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. The submission describes a convex deep learning formulation that leverages a number of key ideas. First, a training objective is proposed that explicitly includes the outputs of hidden layers as variables to be inferred via optimization. These are linked to linear responses via a loss function, and the overall objective is the sum of these loss functions across the layers, plus regularization terms. Next, a number of changes of variables are performed in order to reparameterize the objective into a convex form, heavily leveraging the representer theorem and the idea of value regularization. We are left with a convex objective in terms of three different matrices (per layer) to optimize. In particular, one of these matrices is a nonparametric 'normalized output kernel' matrix, which takes the place of optimizing over the hidden layer outputs directly; however, this leads to a transductive method where we must simultaneously solve the optimization for training and test inputs.
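To make the construction described above concrete, here is a minimal one-hidden-layer sketch of that style of objective; the notation (X for inputs, Y for targets, H for hidden-layer outputs, W_1 and W_2 for layer weights) is illustrative and not necessarily the paper's:

\[
\min_{W_1,\, H,\, W_2}\; L_1(X W_1,\, H) \;+\; L_2(H W_2,\, Y) \;+\; \tfrac{\alpha}{2}\|W_1\|_F^2 \;+\; \tfrac{\beta}{2}\|W_2\|_F^2 .
\]

The hidden outputs H enter as explicit optimization variables, linked to the linear responses X W_1 and H W_2 through the losses L_1 and L_2. The changes of variables then trade direct optimization over H for optimization over a normalized output kernel N \propto H H^\top (together with kernelized dual parameters obtained via the representer theorem and value regularization), which is what yields joint convexity but also forces the transductive treatment noted in the review: N must be defined jointly over training and test inputs.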
Convex Deep Learning via Normalized Kernels
Özlem Aslan, Xinhua Zhang, Dale Schuurmans
Deep learning has been a long standing pursuit in machine learning, which until recently was hampered by unreliable training methods before the discovery of improved heuristics for embedded layer training. A complementary research strategy is to develop alternative modeling architectures that admit efficient training methods while expanding the range of representable structures toward deep models. In this paper, we develop a new architecture for nested nonlinearities that allows arbitrarily deep compositions to be trained to global optimality. The approach admits both parametric and nonparametric forms through the use of normalized kernels to represent each latent layer. The outcome is a fully convex formulation that is able to capture compositions of trainable nonlinear layers to arbitrary depth.
Exchangeability and Kernel Invariance in Trained MLPs
Tsuchida, Russell, Roosta, Fred, Gallagher, Marcus
Despite the widespread usage of deep learning in applications (Mnih et al., 2015; Kalchbrenner et al., 2017; Silver et al., 2017; van den Oord et al., 2018), current theoretical understanding of deep networks continues to lag behind the engineering outcomes being pursued. Recent theoretical contributions have considered networks in their randomized initial state, or made strong assumptions about the parameters or data during training. For example, Cho & Saul (2009); Daniely et al. (2016); Bach (2017); Tsuchida et al. (2018) analyze the kernels of neural networks with random IID weights. Insightful analyses connecting signal propagation in deep networks to chaos have made similar assumptions (Poole et al., 2016; Raghu et al., 2017). Clearly, the assumption of random IID weights is only valid when the network is in its random initial state.
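As a concrete illustration of the kind of random-weight kernel analysis cited here (a sketch under assumed notation, not a construction from this paper), the following Python snippet estimates by Monte Carlo the kernel induced by a single ReLU unit with IID standard Gaussian weights, E_w[ReLU(w·x) ReLU(w·y)], and compares it to the closed form given by the degree-1 arc-cosine kernel of Cho & Saul (2009); the dimensions and sample sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 5, 200_000
x, y = rng.normal(size=d), rng.normal(size=d)

# Monte Carlo estimate of E_w[ReLU(w.x) * ReLU(w.y)] with w ~ N(0, I_d):
# the kernel induced by one ReLU unit with IID Gaussian weights.
W = rng.normal(size=(n_samples, d))
mc = np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0))

# Closed form: (||x|| ||y|| / 2pi) * (sin(theta) + (pi - theta) * cos(theta)),
# i.e. half the degree-1 arc-cosine kernel of Cho & Saul (2009).
cos_t = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
closed = (np.linalg.norm(x) * np.linalg.norm(y) / (2 * np.pi)) * (
    np.sin(theta) + (np.pi - theta) * np.cos(theta))

print(mc, closed)  # the two values should agree to a few decimal places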
Training CNNs With Normalized Kernels
Ozay, Mete (Tohoku University) | Okatani, Takayuki (Tohoku University and RIKEN Center for AIP)
Several methods of normalizing convolution kernels have been proposed in the literature to train convolutional neural networks (CNNs), and have shown some success. However, our understanding of these methods has lagged behind their success in application; many questions remain open, such as why a certain type of kernel normalization is effective and what type of normalization should be employed for each (e.g., higher or lower) layer of a CNN. As a first step towards answering these questions, we propose a framework that enables us to use a variety of kernel normalization methods at any layer of a CNN. A naive integration of kernel normalization with a general optimization method, such as SGD, often entails instability while updating parameters. Thus, existing methods employ ad-hoc procedures to empirically assure convergence. In this study, we pose estimation of convolution kernels under normalization constraints as constraint-free optimization on kernel submanifolds that are identified by the employed constraints. Note that naive application of established optimization methods for matrix manifolds to these problems is not feasible because of the hierarchical nature of CNNs. To this end, we propose an algorithm for optimization on kernel manifolds in CNNs that appropriately scales the space of kernels based on the structure of the CNN and the statistics of the data. We prove that the proposed algorithm is assured of almost sure convergence to a solution at a single minimum. Our experimental results show that the proposed method can successfully train popular CNN models using several different types of kernel normalization methods. Moreover, the proposed method improves the classification performance of baseline CNNs and provides state-of-the-art performance on major image classification benchmarks.
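To ground the idea of treating a normalization constraint as optimization on a kernel submanifold (a generic sketch, not the algorithm proposed in this paper), the following Python snippet performs one SGD step that keeps each convolution filter on the unit-Frobenius-norm sphere: the Euclidean gradient is projected onto the sphere's tangent space and the update is retracted back by renormalization. The function name, tensor shapes, and learning rate are illustrative assumptions.

import numpy as np

def sphere_sgd_step(W, grad, lr=0.1):
    """One SGD step constrained to unit-norm convolution filters (illustrative).

    W    : conv kernel of shape (out_ch, in_ch, kh, kw), each filter assumed unit norm
    grad : Euclidean gradient of the loss w.r.t. W, same shape
    """
    out_ch = W.shape[0]
    Wf, Gf = W.reshape(out_ch, -1), grad.reshape(out_ch, -1)
    # Project the gradient onto the tangent space of the unit sphere, per filter.
    tangent = Gf - np.sum(Gf * Wf, axis=1, keepdims=True) * Wf
    # Gradient step followed by retraction (renormalization) back onto the sphere.
    Wf_new = Wf - lr * tangent
    Wf_new /= np.linalg.norm(Wf_new, axis=1, keepdims=True)
    return Wf_new.reshape(W.shape)

In a real training loop, grad would come from backpropagation; the step only illustrates the project-then-retract pattern that replaces ad-hoc renormalization after each parameter update.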